Goto

Collaborating Authors

 Manaus


Concept Map Assessment Through Structure Classification

arXiv.org Artificial Intelligence

Due to their versatility, concept maps are used in various educational settings and serve as tools that enable educators to comprehend students' knowledge construction. An essential component for analyzing a concept map is its structure, which can be categorized into three distinct types: spoke, network, and chain. Understanding the predominant structure in a map offers insights into the student's depth of comprehension of the subject. Therefore, this study examined 317 distinct concept map structures, classifying them into one of the three types, and used statistical and descriptive information from the maps to train multiclass classification models. As a result, we achieved an 86\% accuracy in classification using a Decision Tree. This promising outcome can be employed in concept map assessment systems to provide real-time feedback to the student.


Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview

arXiv.org Artificial Intelligence

Software testing and verification are critical for ensuring the reliability and security of modern software systems. Traditionally, formal verification techniques, such as model checking and theorem proving, have provided rigorous frameworks for detecting bugs and vulnerabilities. However, these methods often face scalability challenges when applied to complex, real-world programs. Recently, the advent of Large Language Models (LLMs) has introduced a new paradigm for software analysis, leveraging their ability to understand insecure coding practices. Although LLMs demonstrate promising capabilities in tasks such as bug prediction and invariant generation, they lack the formal guarantees of classical methods. This paper presents a comprehensive study of state-of-the-art software testing and verification, focusing on three key approaches: classical formal methods, LLM-based analysis, and emerging hybrid techniques, which combine their strengths. We explore each approach's strengths, limitations, and practical applications, highlighting the potential of hybrid systems to address the weaknesses of standalone methods. We analyze whether integrating formal rigor with LLM-driven insights can enhance the effectiveness and scalability of software verification, exploring their viability as a pathway toward more robust and adaptive testing frameworks.


CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection

arXiv.org Artificial Intelligence

Identifying vulnerabilities in source code is crucial, especially in critical software components. Existing methods such as static analysis, dynamic analysis, formal verification, and recently Large Language Models are widely used to detect security flaws. This paper introduces CASTLE (CWE Automated Security Testing and Low-Level Evaluation), a benchmarking framework for evaluating the vulnerability detection capabilities of different methods. We assess 13 static analysis tools, 10 LLMs, and 2 formal verification tools using a hand-crafted dataset of 250 micro-benchmark programs covering 25 common CWEs. We propose the CASTLE Score, a novel evaluation metric to ensure fair comparison. Our results reveal key differences: ESBMC (a formal verification tool) minimizes false positives but struggles with vulnerabilities beyond model checking, such as weak cryptography or SQL injection. Static analyzers suffer from high false positives, increasing manual validation efforts for developers. LLMs perform exceptionally well in the CASTLE dataset when identifying vulnerabilities in small code snippets. However, their accuracy declines, and hallucinations increase as the code size grows. These results suggest that LLMs could play a pivotal role in future security solutions, particularly within code completion frameworks, where they can provide real-time guidance to prevent vulnerabilities. The dataset is accessible at https://github.com/CASTLE-Benchmark.


Deep Learning-Based Transfer Learning for Classification of Cassava Disease

arXiv.org Artificial Intelligence

This paper presents a performance comparison among four Convolutional Neural Network architectures (EfficientNet-B3, InceptionV3, ResNet50, and VGG16) for classifying cassava disease images. The images were sourced from an imbalanced dataset from a competition. Appropriate metrics were employed to address class imbalance. The results indicate that EfficientNet-B3 achieved on this task accuracy of 87.7%, precision of 87.8%, revocation of 87.8% and F1-Score of 87.7%. These findings suggest that EfficientNet-B3 could be a valuable tool to support Digital Agriculture.


Application of Attention Mechanism with Bidirectional Long Short-Term Memory (BiLSTM) and CNN for Human Conflict Detection using Computer Vision

arXiv.org Artificial Intelligence

The automatic detection of human conflicts through videos is a crucial area in computer vision, with significant applications in monitoring and public safety policies. However, the scarcity of public datasets and the complexity of human interactions make this task challenging. This study investigates the integration of advanced deep learning techniques, including Attention Mechanism, Convolutional Neural Networks (CNNs), and Bidirectional Long ShortTerm Memory (BiLSTM), to improve the detection of violent behaviors in videos. The research explores how the use of the attention mechanism can help focus on the most relevant parts of the video, enhancing the accuracy and robustness of the model. The experiments indicate that the combination of CNNs with BiLSTM and the attention mechanism provides a promising solution for conflict monitoring, offering insights into the effectiveness of different strategies. This work opens new possibilities for the development of automated surveillance systems that can operate more efficiently in real-time detection of violent events.


An Experimental Study on Data Augmentation Techniques for Named Entity Recognition on Low-Resource Domains

arXiv.org Artificial Intelligence

Named Entity Recognition (NER) is a machine learning task that traditionally relies on supervised learning and annotated data. Acquiring such data is often a challenge, particularly in specialized fields like medical, legal, and financial sectors. Those are commonly referred to as low-resource domains, which comprise long-tail entities, due to the scarcity of available data. To address this, data augmentation techniques are increasingly being employed to generate additional training instances from the original dataset. In this study, we evaluate the effectiveness of two prominent text augmentation techniques, Mention Replacement and Contextual Word Replacement, on two widely-used NER models, Bi-LSTM+CRF and BERT. We conduct experiments on four datasets from low-resource domains, and we explore the impact of various combinations of training subset sizes and number of augmented examples. We not only confirm that data augmentation is particularly beneficial for smaller datasets, but we also demonstrate that there is no universally optimal number of augmented examples, i.e., NER practitioners must experiment with different quantities in order to fine-tune their projects.


InstCache: A Predictive Cache for LLM Serving

arXiv.org Artificial Intelligence

Large language models are revolutionizing every aspect of human life. However, the unprecedented power comes at the cost of significant computing intensity, suggesting long latency and large energy footprint. Key-Value Cache and Semantic Cache have been proposed as a solution to the above problem, but both suffer from limited scalability due to significant memory cost for each token or instruction embeddings. Motivated by the observations that most instructions are short, repetitive and predictable by LLMs, we propose to predict user-instructions by an instruction-aligned LLM and store them in a predictive cache, so-called InstCache. We introduce an instruction pre-population algorithm based on the negative log likelihood of instructions, determining the cache size with regard to the hit rate. The proposed InstCache is efficiently implemented as a hash table with minimal lookup latency for deployment. Experimental results show that InstCache can achieve up to 51.34% hit rate on LMSys dataset, which corresponds to a 2x speedup, at a memory cost of only 4.5GB. Recently Large Language Models (LLMs) as well as their multi-modal equivalents have become the essential driver of a new wave of technology innovation, revolutionizing every aspect of human life.


Was it Slander? Towards Exact Inversion of Generative Language Models

arXiv.org Artificial Intelligence

Training large language models (LLMs) requires a substantial investment of time and money. To get a good return on investment, the developers spend considerable effort ensuring that the model never produces harmful and offensive outputs. However, bad-faith actors may still try to slander the reputation of an LLM by publicly reporting a forged output. In this paper, we show that defending against such slander attacks requires reconstructing the input of the forged output or proving that it does not exist. To do so, we propose and evaluate a search based approach for targeted adversarial attacks for LLMs. Our experiments show that we are rarely able to reconstruct the exact input of an arbitrary output, thus demonstrating that LLMs are still vulnerable to slander attacks.


Automated Repair of AI Code with Large Language Models and Formal Verification

arXiv.org Artificial Intelligence

In contrast to classic software development, neural networks are crafted via a long process of trial and error that terminates when their predictive performance reaches a satisfactory level [4, 36]. The iterative and performance-driven nature of this process leaves neural networks vulnerable on many fronts [23]: poor performance on out-of-distribution [21] and adversarial inputs [32], misspecification of the neural architecture and training process [24], invocation of broken and deprecated libraries [30], outright software bugs [22]. Unfortunately, many of these vulnerabilities are not easy to catch early in the development process and may remain hidden until after deployment. Although efforts to debug the actual implementation of neural networks exist, they are based on automatic testing and thus cannot prove correctness for all inputs [33, 22, 18]. This lack of guarantees is especially concerning for safety-critical systems since common software vulnerabilities [11] (e.g., arithmetic overflows, invalid memory accesses) can make the networks produce wrong results, expose sensitive data or corrupt the system they are executed on. In this report, we tackle the challenge of producing bug-free implementations of neural networks in the following way. First, we employ software verifiers to ensure full coverage of the state space. In the past, it has been claimed that software verifiers struggle to cope with neural network code due to its size, complexity and the presence of numerous calls to the standard mathematical library [26].


Privacy-Aware Semantic Cache for Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) like ChatGPT and Llama2 have revolutionized natural language processing and search engine dynamics. However, these models incur exceptionally high computational costs. For instance, GPT-3 consists of 175 billion parameters where inference demands billions of floating-point operations. Caching is a natural solution to reduce LLM inference costs on repeated queries which constitute about 31% of the total queries. However, existing caching methods are incapable of finding semantic similarities among LLM queries, leading to unacceptable false hit-and-miss rates. This paper introduces MeanCache, a user-centric semantic cache for LLMs that identifies semantically similar queries to determine cache hit or miss. Using MeanCache, the response to a user's semantically similar query can be retrieved from a local cache rather than re-querying the LLM, thus reducing costs, service provider load, and environmental impact. Existing caching solutions for LLMs raise privacy and scalability concerns and perform wasteful query requests. MeanCache leverages Federated Learning (FL) to collaboratively train a query similarity model across LLM users without violating privacy. By placing a local cache in each user's device and using FL, MeanCache reduces the latency and costs and enhances model performance, resulting in lower false hit rates. MeanCache compresses the embedding dimensions to minimize cache storage and also finds the optimal cosine similarity threshold. Our experiments benchmarked against the state-of-the-art caching method, reveal that MeanCache attains an approximately 17% higher F-score and a 20% increase in precision during semantic cache hit-and-miss decisions. It also reduces the storage requirement by 83% and accelerates semantic cache hit-and-miss decisions by 11%.